Smart Paper Notes

Video Manipulations Beyond Faces: A Dataset with Human-Machine Analysis

Trisha Mittal, Ritwik Sinha, Viswanathan Swaminathan, John Collomosse, Dinesh Manocha

Categories: Computer Vision | Artificial Intelligence

2022-07-26

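As content-editing tools mature and artificial intelligence (AI) based algorithms for synthesizing media grow, manipulated content is becoming more common across online media. This phenomenon spreads misinformation and creates a greater need to distinguish "real" from "manipulated" content. To this end, we present VideoSham, a dataset consisting of 826 videos (413 real and 413 manipulated). Many existing deepfake datasets focus on two types of facial manipulation: swapping in a different subject's face or altering the existing face. VideoSham, by contrast, contains more diverse, context-rich, and human-centric high-resolution videos manipulated using a combination of 6 different spatial and temporal attacks. Our analysis shows that state-of-the-art manipulation detection algorithms work only for a few specific attacks and do not scale well on VideoSham. We conducted a user study on Amazon Mechanical Turk with 1200 participants to test whether they can distinguish real from manipulated videos in VideoSham. Finally, we examine in more depth the strengths and weaknesses of human and state-of-the-art (SOTA) algorithm performance to identify gaps that better AI algorithms need to fill.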

Translated by Google Translate

MixupE: Understanding and Improving Mixup from Directional Derivative Perspective

Vikas Verma, Sarthak Mittal, Wai Hoh Tang, Hieu Pham, Juho Kannala, Yoshua Bengio, Arno Solin, Kenji Kawaguchi

Categories: Machine Learning | Computer Vision

2022-12-27

Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. We then propose a new method to improve Mixup based on the novel insight. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across various datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.
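
The interpolation that the abstract describes can be illustrated with a minimal sketch of standard Mixup (not the improved method the paper proposes). The function names, the one-hot label encoding, and the choice of drawing the mixing weight from a Beta(alpha, alpha) distribution are assumptions made here for illustration; the abstract states only that pairs of inputs and labels are linearly interpolated.

```python
import random


def sample_lambda(alpha=1.0, rng=random):
    """Draw a mixing weight in [0, 1].

    Assumption: Beta(alpha, alpha) is used here as a common choice;
    the abstract does not specify the sampling distribution.
    """
    return rng.betavariate(alpha, alpha)


def mixup_pair(x1, y1, x2, y2, lam):
    """Linearly interpolate two inputs and their (one-hot) labels.

    The same weight lam is applied to the inputs and to the labels.
    """
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


# Two toy samples with 2-dimensional inputs and 2-class one-hot labels.
x_mix, y_mix = mixup_pair([0.0, 2.0], [1.0, 0.0], [4.0, 6.0], [0.0, 1.0], lam=0.25)
print(x_mix, y_mix)
```

With a fixed weight of 0.25, the mixed sample sits three quarters of the way toward the second input, and its label carries the same proportions.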

NFResNet: Multi-scale and U-shaped Networks for Deblurring

Tanish Mittal, Preyansh Agrawal, Esha Pahwa, Aarya Makwana

Categories: Computer Vision

2022-12-12

Multi-scale and U-shaped networks are widely used in various image restoration problems, including deblurring. Keeping in mind the wide range of applications, we present a comparison of these architectures and their effects on image deblurring. We also introduce a new block called NFResblock. It consists of a Fast Fourier Transformation layer and a series of modified Non-Linear Activation Free Blocks. Based on these architectures and additions, we introduce NFResnet and NFResnet+, which are modified multi-scale and U-Net architectures, respectively. We also use three different loss functions to train these architectures: Charbonnier Loss, Edge Loss, and Frequency Reconstruction Loss. Extensive experiments on the Deep Video Deblurring dataset, along with ablation studies for each component, are presented in this paper. The proposed architectures achieve a considerable increase in Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) values.
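
Of the three losses named in the abstract, the Charbonnier loss has a compact standard form, sqrt((x - y)^2 + eps^2) averaged over pixels, and can be sketched as follows. The function name, the flat-list representation of an image, and the default `eps` value are assumptions for illustration; the abstract does not give the constant the authors use.

```python
import math


def charbonnier_loss(pred, target, eps=1e-3):
    """Mean Charbonnier loss between two equal-length sequences of pixel values.

    Each term is sqrt((p - t)^2 + eps^2), a smooth approximation of |p - t|.
    eps=1e-3 is an assumed default, not a value taken from the paper.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and of equal length")
    total = sum(math.sqrt((p - t) ** 2 + eps ** 2) for p, t in zip(pred, target))
    return total / len(pred)


# Toy "images" flattened to lists of intensities.
print(charbonnier_loss([0.2, 0.5, 0.9], [0.25, 0.5, 0.7]))
```

Unlike a plain L1 loss, this form is differentiable at zero error, which is the usual reason it is chosen for restoration tasks.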

DP-RAFT: A Differentially Private Recipe for Accelerated Fine-Tuning

Ashwinee Panda, Xinyu Tang, Vikash Sehwag, Saeed Mahloujifar, Prateek Mittal

Categories: Machine Learning | Artificial Intelligence

2022-12-08

A major direction in differentially private machine learning is differentially private fine-tuning: pretraining a model on a source of "public data" and transferring the extracted features to downstream tasks. This is an important setting because many industry deployments fine-tune publicly available feature extractors on proprietary data for downstream tasks. In this paper, we use features extracted from state-of-the-art open source models to solve benchmark tasks in computer vision and natural language processing using differentially private fine-tuning. Our key insight is that by accelerating training, we can quickly drive the model parameters to regions in parameter space where the impact of noise is minimized. In doing so, we recover the same performance as non-private fine-tuning for realistic values of epsilon in [0.01, 1.0] on benchmark image classification datasets including CIFAR100.
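
To make "the impact of noise" concrete, here is a minimal sketch of one noisy gradient aggregation step in the style of generic DP-SGD: clip each per-example gradient, sum, add Gaussian noise, and average. This is not the DP-RAFT recipe itself, which the abstract does not spell out; the function names and the parameters `max_norm` and `noise_multiplier` are illustrative assumptions.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip per-example gradients, sum them, add Gaussian noise, then average.

    Assumption: the noise standard deviation is noise_multiplier * max_norm,
    as in generic DP-SGD; this is not taken from the paper.
    """
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.0, 0.5]]
print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Because the noise scale is fixed by the clipping norm rather than by the true gradient size, fewer noisy steps means less accumulated perturbation, which is consistent with the abstract's emphasis on accelerating training.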

Nostradamus: Weathering Worth

Alapan Chaudhuri, Zeeshan Ahmed, Ashwin Rao, Shivansh Subramanian, Shreyas Pradhan, Abhishek Mittal

Categories: Machine Learning

2022-12-08

Nostradamus, inspired by the French astrologer and reputed seer, is a detailed study exploring relations between environmental factors and changes in the stock market. In this paper, we analyze associative correlation and causation between environmental elements and stock prices based on the US financial market, global climate trends, and daily weather records to demonstrate significant relationships between climate and stock price fluctuation. Our analysis covers short and long-term rises and dips in company stock performances. Lastly, we take four natural disasters as a case study to observe their effect on the emotional state of people and their influence on the stock market.

Are Face Detection Models Biased?

Surbhi Mittal, Kartik Thakral, Puspita Majumdar, Mayank Vatsa, Richa Singh

Categories: Computer Vision

2022-11-07

The presence of bias in deep models leads to unfair outcomes for certain demographic subgroups. Research in bias focuses primarily on facial recognition and attribute prediction with scarce emphasis on face detection. Existing studies consider face detection as binary classification into 'face' and 'non-face' classes. In this work, we investigate possible bias in the domain of face detection through facial region localization which is currently unexplored. Since facial region localization is an essential task for all face recognition pipelines, it is imperative to analyze the presence of such bias in popular deep models. Most existing face detection datasets lack suitable annotation for such analysis. Therefore, we web-curate the Fair Face Localization with Attributes (F2LA) dataset and manually annotate more than 10 attributes per face, including facial localization information. Utilizing the extensive annotations from F2LA, an experimental setup is designed to study the performance of four pre-trained face detectors. We observe (i) a high disparity in detection accuracies across gender and skin-tone, and (ii) interplay of confounding factors beyond demography. The F2LA data and associated annotations can be accessed at http://iab-rubric.org/index.php/F2LA.

Multi-Agent Reinforcement Learning for Adaptive Mesh Refinement

Jiachen Yang, Ketan Mittal, Tarik Dzanic, Socratis Petrides, Brendan Keith, Brenden Petersen, Daniel Faissol, Robert Anderson

Categories: Machine Learning | Artificial Intelligence

2022-11-02

Adaptive mesh refinement (AMR) is necessary for efficient finite element simulations of complex physical phenomena, as it allocates a limited computational budget based on the need for higher or lower resolution, which varies over space and time. We present a novel formulation of AMR as a fully-cooperative Markov game, in which each element is an independent agent that makes refinement and de-refinement choices based on local information. We design a novel deep multi-agent reinforcement learning (MARL) algorithm called Value Decomposition Graph Network (VDGN), which solves the two core challenges that AMR poses for MARL: posthumous credit assignment due to agent creation and deletion, and unstructured observations due to the diversity of mesh geometries. For the first time, we show that MARL enables anticipatory refinement of regions that will encounter complex features at future times, thereby unlocking entirely new regions of the error-cost objective landscape that are inaccessible to traditional methods based on local error estimators. Comprehensive experiments show that VDGN policies significantly outperform error threshold-based policies in global error and cost metrics. We show that learned policies generalize to test problems with physical features, mesh geometries, and longer simulation times that were not seen in training. We also extend VDGN with multi-objective optimization capabilities to find the Pareto front of the tradeoff between cost and error.

Learning State-Aware Visual Representations from Audible Interactions

Himangi Mittal, Pedro Morgado, Unnat Jain, Abhinav Gupta

Categories: Computer Vision

2022-09-27

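We propose a self-supervised algorithm to learn representations from egocentric video data. Recently, significant efforts have been made to capture humans interacting with their own environments as they go about their daily activities. As a result, several large egocentric datasets of interaction-rich multi-modal data have emerged. However, learning representations from videos can be challenging. First, given the uncurated nature of long-form continuous videos, learning effective representations requires focusing on the moments in time when interactions take place. Second, visual representations of daily activities should be sensitive to changes in the state of the environment. However, current successful multi-modal learning frameworks encourage representations that are invariant over time. To address these challenges, we leverage audio signals to identify moments of likely interaction, which are conducive to better learning. We also propose a novel self-supervised objective that learns from audible state changes caused by interactions. We validate these contributions extensively on two large-scale egocentric datasets, EPIC-Kitchens-100 and the recently released Ego4D, and show improvements on several downstream tasks, including action recognition, long-term action anticipation, and object state change classification.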

DeePhy: On Deepfake Phylogeny

Kartik Narayan, Harsh Agarwal, Kartik Thakral, Surbhi Mittal, Mayank Vatsa, Richa Singh

Categories: Computer Vision

2022-09-19

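Deepfake refers to tailored and synthetically generated videos, which are now prevalent and spreading on a large scale, threatening the trustworthiness of information available online. While existing datasets contain different kinds of deepfakes that vary in their generation technique, they do not consider the progression of deepfakes in a "phylogenetic" manner. An existing deepfake face may itself be swapped with another face. This face-swapping process can be performed multiple times, and the resulting deepfake can evolve to confuse deepfake detection algorithms. Furthermore, many databases do not provide the generative model used as a target label. Model attribution helps enhance the explainability of detection results by providing information about the generative model used. To enable the research community to address these questions, this paper proposes DeePhy, a novel deepfake phylogeny dataset consisting of 5040 deepfake videos generated using three different generation techniques. There are 840 videos of once-swapped deepfakes, 2520 videos of twice-swapped deepfakes, and 1680 videos of three-times-swapped deepfakes. The database is over 30 GB in size and was prepared in over 1100 hours using 18 GPUs with 1,352 GB of cumulative memory. We also present a benchmark on the DeePhy dataset using six deepfake detection algorithms. The results highlight the need to advance research on model attribution of deepfakes and to generalize the process across a variety of deepfake generation techniques. The database is available at: http://iab-rubric.org/deephy-database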

Renyi Differential Privacy of Propose-Test-Release and Applications to Private and Robust Machine Learning

Jiachen T. Wang, Saeed Mahloujifar, Shouda Wang, Ruoxi Jia, Prateek Mittal

Categories: Machine Learning

2022-09-16

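Propose-Test-Release (PTR) is a differential privacy framework that works with the local sensitivity of functions instead of their global sensitivity. This framework is typically used to release robust statistics, such as the median or trimmed mean, in a differentially private manner. Although PTR is a common framework introduced over a decade ago, using it in applications such as robust SGD, where many adaptive robust queries are needed, is challenging. This is mainly due to the lack of a Renyi Differential Privacy (RDP) analysis, an essential ingredient of the moments accountant approach to differentially private deep learning. In this work, we generalize the standard PTR and derive the first RDP bound for it when the target function has bounded global sensitivity. We show that our RDP bound for PTR yields tighter DP guarantees than the directly analyzed $(\epsilon, \delta)$-DP. We also derive an algorithm-specific privacy amplification bound for PTR under subsampling. We show that our bound is much tighter than the general upper bound and close to the lower bound. Our RDP bounds enable tighter privacy loss calculation for the composition of many adaptive runs of PTR. As an application of our analysis, we show that PTR and our theoretical results can be used to design differentially private variants of Byzantine-robust training algorithms that use robust statistics for gradient aggregation. We conduct experiments in settings with label, feature, and gradient corruption across different datasets and architectures. We show that the PTR-based private and robust training algorithm significantly improves utility compared with the baseline.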
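
Why an RDP bound helps when a mechanism is run many times can be illustrated with the well-known Gaussian mechanism rather than with PTR, whose bound is derived in the paper and not reproduced here. The sketch below relies on three standard facts: a Gaussian mechanism with L2 sensitivity Δ and noise standard deviation σ satisfies (α, αΔ²/(2σ²))-RDP, RDP parameters add under composition, and (α, ε)-RDP implies (ε + log(1/δ)/(α − 1), δ)-DP. The function names and the grid of α values are assumptions for illustration.

```python
import math


def gaussian_rdp(alpha, sigma, sensitivity=1.0):
    """RDP epsilon of order alpha for the Gaussian mechanism (not for PTR)."""
    return alpha * sensitivity ** 2 / (2.0 * sigma ** 2)


def rdp_to_dp(rdp_eps, alpha, delta):
    """Convert an (alpha, rdp_eps)-RDP guarantee into an (eps, delta)-DP epsilon."""
    return rdp_eps + math.log(1.0 / delta) / (alpha - 1.0)


def composed_epsilon(num_runs, sigma, delta, alphas):
    """Epsilon after num_runs adaptive runs: add RDP costs, then pick the best order."""
    return min(
        rdp_to_dp(num_runs * gaussian_rdp(a, sigma), a, delta) for a in alphas
    )


# Hypothetical grid of RDP orders; finer grids can only tighten the result.
orders = [1.5, 2, 3, 4, 8, 16, 32, 64]
for runs in (1, 10, 100):
    print(runs, composed_epsilon(runs, sigma=4.0, delta=1e-5, alphas=orders))
```

The accounting is additive in the RDP domain and converted to (ε, δ)-DP only once at the end, which is the same pattern the abstract refers to when it mentions tighter privacy loss calculation for many adaptive runs.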
